R Data Analysis Cookbook by Unknown

Author:Unknown
Language: eng
Format: mobi, epub
Publisher: Packt Publishing


How it works...

In step 1, the data is read; in step 2, we define a convenience function for scaling a set of variables in a data frame.

In step 3, the convenience function is used to scale only the variables of interest. We leave out the No, model_year, and car_name variables because they are identifiers or labels rather than measurements to cluster on.
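The recipe's own helper from step 2 is not reproduced in this section, so the following is only a minimal sketch of what such a scaling function might look like. The name scale_columns is hypothetical, and the built-in mtcars data set stands in for the recipe's data.

```r
# Hypothetical helper (not the book's code): standardize only the named
# columns of a data frame and leave every other column untouched.
scale_columns <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) (x - mean(x)) / sd(x))
  df
}

# Assumption: mtcars stands in for the recipe's data set.
cars <- mtcars
vars <- c("mpg", "disp", "hp", "wt", "qsec")
cars_z <- scale_columns(cars, vars)

# Each scaled column should now have mean 0 and standard deviation 1.
round(colMeans(cars_z[vars]), 10)
sapply(cars_z[vars], sd)
```

Scaling matters here because variables measured in large units would otherwise dominate the distance computation.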

In step 4, the distance matrix is created from the standardized values of the relevant variables. We computed Euclidean distances; the other methods that the dist function supports are maximum, manhattan, canberra, binary, and minkowski.
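As a rough illustration of the distance step, the sketch below builds two distance matrices with dist on standardized columns. The data set (mtcars) and the choice of columns are assumptions made for the example, not the recipe's data.

```r
# Assumption: mtcars and these columns stand in for the recipe's variables.
vars <- c("mpg", "disp", "hp", "wt", "qsec")
z <- scale(mtcars[, vars])

# dist() returns the lower triangle of the pairwise distance matrix.
d_euc <- dist(z, method = "euclidean")
d_man <- dist(z, method = "manhattan")

# One distance per unordered pair of rows: n * (n - 1) / 2 values.
length(d_euc)

# Inspect the distances among the first three cars.
as.matrix(d_euc)[1:3, 1:3]
```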

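In step 5, the distance matrix is passed to the hclust function to create the clustering model. We specified method = "ward" to use Ward's method, which tends to produce compact, spherical clusters. From R 3.1.0 onward this option is named "ward.D" (the old name "ward" is still accepted, with a message), and a "ward.D2" variant that squares the dissimilarities before clustering is also available. The hclust function also supports single, complete, average, mcquitty, median, and centroid.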
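The following sketch shows the clustering call on stand-in data. It uses method = "ward.D2" rather than the recipe's "ward", and that swap, the mtcars data, and the column choice are all assumptions made for the example.

```r
# Assumption: mtcars stands in for the recipe's data set.
vars <- c("mpg", "disp", "hp", "wt", "qsec")
z <- scale(mtcars[, vars])
d <- dist(z)

# Ward-style agglomeration; "ward.D2" is used here in place of "ward".
fit <- hclust(d, method = "ward.D2")

# A different linkage for comparison.
fit_avg <- hclust(d, method = "average")

# Each of the n - 1 rows of merge records one agglomeration step,
# and height holds the dissimilarity at which that merge happened.
nrow(fit$merge)
tail(fit$height, 5)
```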

In step 6, the resulting dendrogram is plotted. We specified labels = FALSE because we have too many cases, and printing their labels would only add clutter. With a smaller dataset, labels = TRUE makes sense. The hang argument controls the distance from the bottom of the dendrogram to the labels. Since we are not using labels, we specified hang = 0 to prevent numerous vertical lines below the dendrogram.

The dendrogram shows all the cases at the bottom (too numerous to distinguish in our plot) and shows the step-by-step agglomeration of the clusters. The dendrogram is organized in such a way that we can obtain a desired set of clusters, say K, by drawing a horizontal line in such a way that it cuts across exactly K vertical lines on the dendrogram.
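The horizontal-line idea can be checked numerically, as in the sketch below. Any height that lies strictly between two consecutive merge heights cuts the tree into a fixed number of clusters, and cutree accepts such a height through its h argument. The data set and K = 3 are assumptions made for the example.

```r
# Assumption: mtcars stands in for the recipe's data set.
vars <- c("mpg", "disp", "hp", "wt", "qsec")
fit <- hclust(dist(scale(mtcars[, vars])), method = "ward.D2")

n <- nrow(mtcars)
K <- 3

# After merge number n - K there are exactly K clusters left, so a line
# drawn between that merge height and the next one crosses K branches.
cut_height <- mean(fit$height[c(n - K, n - K + 1)])

by_height <- cutree(fit, h = cut_height)
by_k <- cutree(fit, k = K)

# Both ways of cutting should give the same partition.
length(unique(by_height))
all(by_height == by_k)
```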

Step 7 shows how to use the rect.hclust function to demarcate the cases comprising the various clusters for a selected value of k.
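A minimal sketch of the plotting and demarcation steps follows, again on stand-in data. It writes the plot to a temporary PDF so that it also runs without a screen device; drop the pdf() and dev.off() calls to draw on screen instead. The data set, the value k = 3, and the file name are assumptions.

```r
# Assumption: mtcars stands in for the recipe's data set.
vars <- c("mpg", "disp", "hp", "wt", "qsec")
fit <- hclust(dist(scale(mtcars[, vars])), method = "ward.D2")

out_file <- file.path(tempdir(), "dendrogram.pdf")
pdf(out_file)

# labels = FALSE and hang = 0 mirror the settings described for step 6.
plot(fit, labels = FALSE, hang = 0, main = "Dendrogram (stand-in data)")

# Draw a rectangle around each of the k clusters; the return value is a
# list with one vector of case indices per rectangle.
groups <- rect.hclust(fit, k = 3, border = "red")

invisible(dev.off())

# Number of cases inside each rectangle.
lengths(groups)
```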

Step 8 shows how we can use the cutree function to identify, for a specific K, which cluster each case of our data belongs to.
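As an illustration of the last step, the sketch below assigns each case to a cluster with cutree and then summarizes the clusters. The data set, the column choice, and K = 3 are assumptions made for the example.

```r
# Assumption: mtcars stands in for the recipe's data set.
vars <- c("mpg", "disp", "hp", "wt", "qsec")
fit <- hclust(dist(scale(mtcars[, vars])), method = "ward.D2")

# One cluster number (1..K) per case, in the original row order.
clusters <- cutree(fit, k = 3)
head(clusters)

# Cluster sizes.
table(clusters)

# Mean of each original (unscaled) variable within each cluster.
profile <- aggregate(mtcars[, vars], by = list(cluster = clusters), FUN = mean)
profile
```

Attaching the cluster vector to the original data frame, for example as a new column, makes it easy to inspect which cases ended up together.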


